Abstract: We consider the problem of compression of two
memoryless binary sources, the correlation between which is
defined by a Hidden Markov Model (HMM). We propose a
Decision Feedback (DF) based scheme which, when used with
low density parity check codes, results in compression close to
the Slepian Wolf limits.



I. INTRODUCTION 

Consider the classical Slepian Wolf set up where two
correlated sources X and Y have to be independently
compressed and sent to a destination. It was shown in [1] that the
achievable rate region is $R_X \ge H(X|Y)$, $R_Y \ge H(Y|X)$
and $R_X + R_Y \ge H(X,Y)$. Recently, several practical coding
schemes have been designed for this problem based on the
idea of using the syndrome of a linear block code as the
compressed output [2]. When $Y = X \oplus e$, where the sequence
e is memoryless, low density parity check (LDPC) codes have
been used to achieve performance close to the Slepian-Wolf
limit [3].

In this paper we consider the case when $Y = X \oplus e$, where
X and Y are binary i.i.d. sequences and e is the output of a
Hidden Markov Model (HMM). This problem has been studied
before by Garcia-Frias et al. [4] and Tian et al. [5]. In their
scheme, X is compressed to H(X) bits and transmitted. The
encoder for Y transmits a portion of the source bits without
compression to "synchronize" the HMM. The remaining bits
are used as bit nodes in an LDPC code and the corresponding
syndrome is transmitted. The decoder employs a message
passing algorithm with messages being passed between the
HMM nodes, the bit nodes and the check nodes. In [5], Tian
et al. considered three HMMs and optimized the LDPC code
ensemble using density evolution for these specific models.
The resulting thresholds (the performance of an infinite length
LDPC code) were 0.08-0.12 bits away from the Slepian Wolf
limits.

Here, we use a different approach. The main differences
between the proposed work and that in [4], [5] are:
(i) a decision feedback scheme is used instead of iterating
between the HMM model nodes and the LDPC decoder, which
also reduces the decoding complexity significantly; (ii) the
LDPC codes used are optimized for a memoryless channel
instead of being optimized for the channel with memory and,
hence, the optimization is considerably simpler than in [5];
(iii) the proposed scheme is similar to the scheme in [7] to
find the capacity of the Gilbert-Elliott channel and is provably
optimal asymptotically in the length.

With the proposed scheme, for the models considered in
[5] we are able to design codes that have thresholds within
0.03 bits of the Slepian Wolf limits allowing for a distortion
of $10^{-5}$, which is considerably better than those in [5].

Fig. 1. Equivalent Channel Coding Problem

II. PROPOSED SYSTEM 

Consider two binary sources X and Y such that $Y = X \oplus e$
where Y is independent and uniformly distributed. Typical
compression schemes to achieve a corner point in the Slepian
Wolf region involve sending X using H(X) bits and sending
the syndrome of Y corresponding to a linear code C using
$H(Y|X)$ bits. It can be shown that the problem of compression
is equivalent to the problem of finding a capacity achieving
linear code for the channel shown in Fig. 1 [2].

When e is memoryless, there are tools available to design
LDPC codes that achieve capacity on this channel and, hence,
achieve the Slepian-Wolf limit. In our case, e is the output
of a HMM with three parameters S, P and $\mu$. S defines the
different states, P is an $|S| \times |S|$ matrix with $P_{ij}$ representing
the probability of transition from state $S_i$ to $S_j$, and $\mu$, $|S| \times 1$,
has elements $\mu_i$ which give $P(e = 0|S_i)$. The probability of
e being 0 or 1 depends only on the current state. We further
assume that when no state information is available, the output
of the HMM is equally likely to be zero or one.
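
The source model above can be made concrete with a short simulation. The following is a minimal sketch, not the authors' implementation: the function names `sample_hmm` and `correlated_pair` are hypothetical, the initial state is assumed to be drawn uniformly, and the example parameters are those of model M1 in Section IV written in the (P, $\mu$) form defined above.

```python
import random

def sample_hmm(P, mu, n, rng):
    """Draw n outputs e_1..e_n from an HMM.

    P[i][j] is the probability of moving from state i to state j,
    mu[i] is P(e = 0 | state i).
    """
    num_states = len(P)
    # Assumption: the initial state is chosen uniformly at random.
    state = rng.randrange(num_states)
    e = []
    for _ in range(n):
        # The output depends only on the current state.
        e.append(0 if rng.random() < mu[state] else 1)
        # Move to the next state according to row `state` of P.
        state = rng.choices(range(num_states), weights=P[state])[0]
    return e

def correlated_pair(P, mu, n, seed=0):
    """Return (x, y, e) with x i.i.d. uniform bits and y = x XOR e."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = sample_hmm(P, mu, n, rng)
    y = [xi ^ ei for xi, ei in zip(x, e)]
    return x, y, e

# Model M1 of Section IV: P(S0->S0) = 0.01, P(S1->S1) = 0.065,
# P(0|S0) = 0.95, P(1|S1) = 0.925, hence mu = [0.95, 1 - 0.925].
P_M1 = [[0.01, 0.99], [0.935, 0.065]]
MU_M1 = [0.95, 0.075]
x, y, e = correlated_pair(P_M1, MU_M1, 20)
```

Since X is uniform and independent of e, the sequence Y produced this way is also uniform, which matches the assumption made at the start of this section.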

In [6] Narayanan et al use a Decision Feedback Equalization 
(DFE) based scheme for ISI channels that makes the channel 
appear memoryless to the LDPC decoder. We use the same 
technique to make the channel appear memoryless and then 
design codes for this "memoryless" channel. The encoding and 
decoding operations are explained below. 

A. Encoder 

We will describe a scheme to achieve a corner point of the
Slepian Wolf coding region corresponding to $R_X = H(X)$
and $R_Y = H(Y|X)$. The encoding process is shown in Fig.
2. Let us assume that both sequences X and Y are first
arranged in the form of L x N matrices X and Y (L columns
of N bits each). The (i,j)th
element in Y is $y_{(i-1)L+j}$. We will use $Y_{i,j}$ to denote the
(i,j)th element in Y and $\mathbf{y}_j$ to denote the jth column of
Y. The sequence X is compressed using an entropy coder
to H(X) bits. For the models considered in this paper, the
sequence X contains independent and uniformly distributed
bits and, hence, no compression is needed for X. The first
M columns in Y are transmitted without any compression 
(these are referred to as pilots). For each column $\mathbf{y}_j$, $j > M$,
in Y the syndrome $\bar{\mathbf{y}}_j$ corresponding to an (N, K) LDPC
code is computed and conveyed to the receiver. When an
LDPC code with N bit nodes and $N - K$ check nodes is
used, the syndrome $\bar{\mathbf{y}}_j$ is simply the check values when the
bit nodes are set to $\mathbf{y}_j$. Therefore, the compressed sequence
is given by $Y_{comp} = (\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_M, \bar{\mathbf{y}}_{M+1}, \ldots, \bar{\mathbf{y}}_L)$. The
compression rate of this scheme is

$$\frac{NM + (N-K)(L-M)}{NL} \qquad (1)$$

Fig. 2. Encoded Sequence
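
The two encoder operations, computing a syndrome and counting the transmitted bits, can be sketched as follows. This is an illustration under stated assumptions rather than the encoder used in the paper: the helper names are hypothetical, the parity-check matrix is a toy (7, 4) Hamming matrix chosen only to keep the example small (it is not an LDPC code), and K = 1000 in the rate example is an arbitrary value.

```python
def syndrome(H, y):
    """Check values of parity-check matrix H (list of rows) when the
    bit nodes are set to the column y."""
    return [sum(h * b for h, b in zip(row, y)) % 2 for row in H]

def compression_rate(N, K, L, M):
    """Transmitted bits per source bit of Y: M uncompressed columns of
    N bits plus L - M syndromes of N - K bits, over N * L source bits."""
    return (N * M + (N - K) * (L - M)) / (N * L)

# Hypothetical toy parity-check matrix with N = 7 and N - K = 3.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
y_col = [1, 0, 1, 1, 0, 0, 1]
s = syndrome(H, y_col)  # 3 check values replace 7 source bits

# Hypothetical rate example with N = 2000, K = 1000, L = 100, M = 4.
rate = compression_rate(2000, 1000, 100, 4)
```

In the rate expression the pilot term alone contributes NM/(NL) = M/L, which is 0.04 for M = 4 and L = 100.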
B. Receiver

The receiver has X and $Y_{comp}$. Since the first M columns
in Y are sent without any compression, the receiver has
the first M columns of Y. Hence, the receiver can form
the error values $\mathbf{e}_j$ for the first M columns. From column
M + 1 onwards, the receiver tries to recover $\mathbf{e}_j$ using the
following procedure. It first computes soft estimates of the bits
$e_{i,M+1}$ from the error values in the past M columns, i.e.,
$e_{i,1}, e_{i,2}, \ldots, e_{i,M}$, by using



$$\gamma_{i,M+1} = \log \frac{P(e_{i,M+1} = 1 \mid e_{i,M}, e_{i,M-1}, \ldots, e_{i,1})}{P(e_{i,M+1} = 0 \mid e_{i,M}, e_{i,M-1}, \ldots, e_{i,1})} \qquad (2)$$

Note that the consecutive values of e from any row are the
sequential outputs of the HMM and, hence, in Equation (2)
the estimate for a particular bit is made only from the past
bits in the same row. Since e is the output of a HMM, $\gamma_{i,j}$
can be computed efficiently using the forward recursion of a
BCJR algorithm. From the soft estimates of $e_{i,M+1}$ one can
directly form soft estimates of $Y_{i,M+1}$, given by $\lambda_{i,M+1}$, since
$Y = X \oplus e$ and X is available at the receiver.
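
The soft estimate in Equation (2) can be sketched with a forward recursion over the HMM states. The code below is a minimal illustration, not the authors' decoder: the names `llr_next_error` and `llr_y` are hypothetical, the state distribution `pi0` for the first observed bit is an assumption supplied by the caller, and the soft estimate of Y is assumed to use the same sign convention as Equation (2), i.e. log P(y = 1)/P(y = 0).

```python
import math

def llr_next_error(P, mu, pi0, past):
    """Return log P(e_next = 1 | past) / P(e_next = 0 | past).

    P[s][t] is the transition probability s -> t, mu[s] is P(e = 0 | s),
    pi0[s] is the assumed distribution of the state emitting past[0].
    """
    n = len(P)
    alpha = list(pi0)  # belief over the current state
    for e in past:
        # Weight each state by the likelihood of the observed error bit.
        alpha = [a * (mu[s] if e == 0 else 1.0 - mu[s])
                 for s, a in enumerate(alpha)]
        # Advance the belief one step through the Markov chain.
        alpha = [sum(alpha[s] * P[s][t] for s in range(n)) for t in range(n)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]  # normalise to avoid underflow
    p0 = sum(alpha[s] * mu[s] for s in range(n))
    return math.log((1.0 - p0) / p0)

def llr_y(gamma, x):
    """Soft estimate of y from gamma: y = e if x = 0, y = 1 - e if x = 1."""
    return gamma if x == 0 else -gamma

# Model M1 of Section IV with its stationary state distribution.
P_M1 = [[0.01, 0.99], [0.935, 0.065]]
MU_M1 = [0.95, 0.075]
PI_M1 = [0.935 / 1.925, 0.99 / 1.925]
gamma = llr_next_error(P_M1, MU_M1, PI_M1, [0, 1, 0, 1])
lam = llr_y(gamma, x=1)
```

Because X is known at the receiver, the step from the estimate of e to the estimate of Y is only a sign flip whenever the corresponding bit of X is 1.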

Now the LDPC decoder is run to decode $\mathbf{y}_{M+1}$ by using
$\boldsymbol{\lambda}_{M+1}$ as the soft output corresponding to $\mathbf{y}_{M+1}$ and $\bar{\mathbf{y}}_{M+1}$
as the check values. With a suitably chosen LDPC code the
receiver can recover $\mathbf{y}_{M+1}$. The whole process can be repeated
to recover the next column and so on till all columns are
decoded. For an LDPC code with finite length codewords,
$\mathbf{y}_{M+1}$ will fail to decode with some probability. This may
cause error propagation within that block.

III. Achievable Information Rate 

The LDPC decoder tries to decode bits $Y_{i,j}$ by using $\lambda_{i,j}$,
which can be considered as the output of a channel with input
$Y_{i,j}$. If L is made large the bits corresponding to a particular
column are far apart in time (at least L time units apart) and
therefore it can be assumed that they go through independent
channels. That is, we can assume that for a given j, the channels
$Y_{i,j} \to \lambda_{i,j}$ and $Y_{p,j} \to \lambda_{p,j}$ are independent and
identical for $i \ne p$. The capacity of this channel is given by

$$C = H(Y_{i,j}|X_{i,j}) - H(Y_{i,j}|X_{i,j}, \lambda_{i,j}) = H(e_{i,j}) - H(e_{i,j}|\gamma_{i,j}) \qquad (3)$$

The second equality in Eqn. (3) is true since $Y = X \oplus e$ and,
hence, $H(Y|X) = H(e)$. Since $\gamma_{i,j}$ is the optimal estimate of
$e_{i,j}$ given $e_{i,j-1}, \ldots, e_{i,j-M}$ we have

$$C = H(e_{i,j}) - H(e_{i,j}|e_{i,j-1}, \ldots, e_{i,j-M}) = 1 - H(e_{i,j}|e_{i,j-1}, \ldots, e_{i,j-M}) \qquad (4)$$

If a capacity achieving code is used then the resulting
compression when $L \gg M$ is $H(e_{i,M+1}|e_{i,M}, \ldots, e_{i,1})$.
Note that the Slepian Wolf compression limit in this case is
$\lim_{M\to\infty} H(e_{i,M+1}|e_{i,M}, \ldots, e_{i,1})$. We can come arbitrarily
close to the Slepian Wolf limits by making M large and using
a capacity achieving code for the "memoryless" channel. This
shows the optimality of this scheme for asymptotically large
L and M. Note that there is a rate loss due to the first M
columns being transmitted without compression, but that rate
loss can be made arbitrarily small by choosing a large enough
L.
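
For a small HMM the conditional entropy $H(e_{M+1}|e_M, \ldots, e_1)$ can be evaluated numerically as a difference of two block entropies. The sketch below does this by brute-force enumeration, which is only practical for small M. The function names are hypothetical, a two-state chain started in its stationary distribution is assumed, and the example parameters are those of model M1 in Section IV.

```python
import math
from itertools import product

def stationary(P):
    """Stationary distribution of a two-state Markov chain."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]

def seq_prob(P, mu, pi, seq):
    """P(e_1..e_n = seq) via the forward recursion, starting from pi."""
    n = len(P)
    alpha = list(pi)
    for e in seq:
        alpha = [a * (mu[s] if e == 0 else 1.0 - mu[s])
                 for s, a in enumerate(alpha)]
        alpha = [sum(alpha[s] * P[s][t] for s in range(n)) for t in range(n)]
    return sum(alpha)

def block_entropy(P, mu, n):
    """H(e_1, ..., e_n) in bits, enumerating all 2^n sequences."""
    pi = stationary(P)
    h = 0.0
    for seq in product((0, 1), repeat=n):
        p = seq_prob(P, mu, pi, seq)
        if p > 0.0:
            h -= p * math.log2(p)
    return h

def cond_entropy(P, mu, M):
    """H(e_{M+1} | e_M, ..., e_1) = H(e_1..e_{M+1}) - H(e_1..e_M)."""
    return block_entropy(P, mu, M + 1) - block_entropy(P, mu, M)

# Model M1 of Section IV.
P_M1 = [[0.01, 0.99], [0.935, 0.065]]
MU_M1 = [0.95, 0.075]
curve = [cond_entropy(P_M1, MU_M1, M) for M in range(6)]
```

For a stationary process the resulting values do not increase with M, and the value at M = 0 is 1 bit whenever P(e = 0) = 0.5.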

Although the arguments presented above show that this
scheme is optimal as $M \to \infty$, we do not require this.
If we use a code of rate $1 - H(e_{i,j}|e_{i,j-1}, \ldots, e_{i,1})$ for
the jth column, then we can obtain a compression rate of
$\frac{1}{L}\sum_{j=1}^{L} H(e_{i,j}|e_{i,j-1}, \ldots, e_{i,1})$, which converges to $H(e) =
H(Y|X)$ from above as $L \to \infty$ for any wide sense stationary
process e. This solution however requires variable rate LDPC
codes for the different columns and, hence, is not used in this
paper.

IV. Simulation Results 

We compare the performance of the proposed scheme with
the scheme used in [5]. The HMM used in [5] has two
states $S_0$ and $S_1$ and is defined by four parameters
$P(S_0 \to S_0)$, $P(S_1 \to S_1)$, $P(0|S_0)$, $P(1|S_1)$. The models considered
are

M1: (0.01, 0.065, 0.95, 0.925)
M2: (0.97, 0.967, 0.93, 0.973)
M3: (0.99, 0.989, 0.945, 0.9895)

Note that the parameters in the model are chosen so that they
satisfy P(e = 0) = 0.5.
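
The condition P(e = 0) = 0.5 can be checked directly from the four parameters. The sketch below assumes the chain is in its stationary distribution; the function name `prob_zero` and the argument names are hypothetical.

```python
def prob_zero(p00, p11, q0, q1):
    """Stationary P(e = 0) of a two-state HMM with
    p00 = P(S0 -> S0), p11 = P(S1 -> S1), q0 = P(0 | S0), q1 = P(1 | S1)."""
    # Stationary probability of S0 for a two-state chain.
    pi0 = (1.0 - p11) / ((1.0 - p00) + (1.0 - p11))
    return pi0 * q0 + (1.0 - pi0) * (1.0 - q1)

MODELS = {
    "M1": (0.01, 0.065, 0.95, 0.925),
    "M2": (0.97, 0.967, 0.93, 0.973),
    "M3": (0.99, 0.989, 0.945, 0.9895),
}
p_zero = {name: prob_zero(*params) for name, params in MODELS.items()}
```

For all three parameter sets the stationary probability of a zero evaluates to 0.5 up to floating point error.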

In Fig. 3 we plot $H(e_{M+1}|e_M, \ldots, e_1)$ as a function
of M for the models. We observe that for these models the
M required to come close to the Slepian Wolf limits is quite
small. We use M = 4 for our simulations.

Fig. 3. Compression Limit vs. M

For the three models the pdfs of $\gamma_{i,j}$ conditioned on $e_{i,j}$
being a 1 and a 0 were computed through Monte Carlo
simulations. From this the distribution of $\lambda_{i,j}$ conditioned on $y_{i,j}$
can be computed. Using this, an LDPC code ensemble was
designed using density evolution and differential evolution. It
was assumed that a Hamming distortion of $10^{-5}$ is acceptable.
Since the HMM is not symmetric, the pdf of $\gamma_{i,j}$ conditioned
on $e_{i,j}$ being a 1 or 0 is not symmetric. That is,
$f_{\gamma_{i,j}}(x|e_{i,j} = 1) \ne f_{\gamma_{i,j}}(-x|e_{i,j} = 0)$ for some x. Hence the distribution
of $\lambda_{i,j}$ is also not symmetric. For the density evolution we
use the average of these pdfs, similar to the approach in [8],
where the correctness of this procedure is proved. Simulations
were done with the designed LDPC codes of length 100000
and L = 100. 2000 such blocks were simulated for each
model. A different interleaver was used for each column to
avoid repetition of error sequences. The results obtained are
compared with those of [5] in Table I. The SW limit column
shows the Slepian Wolf compression limit. The THEO column
represents the threshold, which is the achievable compression
rate with infinite length LDPC codes.

For the DFE scheme simulations were also performed with
N = 2000, L = 100 and M = 4. Codes designed for the
AWGN channel were used in these simulations. The bit filling
algorithm [9] was used to reduce error floors. The results are
also tabulated in Table I. For each model 5000 blocks were
simulated. The Hamming distortion observed was less than
2e-7. Although the performance in this case seems to be far
from the Slepian Wolf limits, it should be noted that this
scheme is universal and does not require any optimizations
specific to the HMM. Although beyond the scope of this
paper, we wish to point out that for small L and finite
lengths, simple improvements to the decoding algorithm can
provide significant improvements in the compression rates. For
example, a particular block could be decoded using the
pilots on both sides.

The loss in rate due to the pilots in the DF scheme is not
included in Table I. If the pilots are sent without compression,
then the compression rate would increase by 0.04. However,
this loss can be reduced significantly by increasing L and by
compressing the pilots.



TABLE I
Results

         SW Limit   Tian et al. [5]   DF Scheme   DF Scheme    DF Scheme
Model    H(Y|X)     THEO              THEO        N = 10^5     N = 2000
1        0.515      0.599             0.546       0.58         0.69
2        0.448      0.544             0.476       0.52         0.62
3        0.278      0.413             0.305       0.34         0.45



With L = 100 error propagation is a serious problem but it 
can be overcome by lowering the rate of the LDPC code. In 
our simulations, no error propagation was observed. 

V. Conclusion 

We proposed a low complexity decision feedback based 
scheme to compress multiterminal sources with hidden 
Markov correlations. The proposed scheme has thresholds just 
0.03 bits away from the Slepian Wolf limits and the simulated 
performance with designed LDPC codes of length 100000 is 
within 0.08 bits of the limits which is better than the thresholds 
of the scheme in [5]. 